Ignorance of the form new physics will take suggests the importance of systematically analyzing
all data collected at the energy frontier, with the goal of maximizing the chance for discovery 
both before and after the turn on of the LHC. 



1 The Game



Which of the many possible extensions to the Standard Model should you spend your time trying 
to detect? The standard American technique for finding an answer to such a difficult question 
is to conduct a poll, since averaging over the responses of many uninformed people has been 
shown to produce useful insight. In this spirit, and in consultation with Gallup, a poll has been 
constructed for contemporary high energy physics. The first question of this poll is: 

1. The first sign of new physics will come from (check one): 

□ Heavy gauge bosons □ Technicolor 

□ Large extra dimensions □ Leptoquarks 

□ Supersymmetry □ a fourth generation of fermions 

□ Compositeness □ Something else

The result of this poll, averaging over the responses of several hundred professional physicists,
is provided in the transparencies on which these proceedings are based. As with most polls,
the result of this one is not particularly informative, but it is nonetheless mildly interesting. 
Although none of the options wins the majority, supersymmetry carries the plurality. 

Assuming Nature, like most Western democracies, is governed by public opinion, she has 
presumably chosen to be supersymmetric. In the interest of elegance and bureaucratic efficiency, 
she has doubtlessly also chosen to be minimally supersymmetric. In this restrictive case, the 
second question of the poll is: 

2. What are the values of the 105 parameters of the MSSM? 



Your local state lottery provides a larger payout at better odds. 

How will a discovery look in an Easter egg hunt involving 1000 physicists each testing one 
of 1000 different models? Although each random member of each kilophysicist collaboration 
will guess wrong, suppose one experimentalist is very lucky and guesses close. To understand 
what "close" means, consider a space of two observables, with a true signal indicated by some 
clustering of events in this space. The physicist takes a particular model from a theoretical 
friend, blinds himself to the data, chooses some region in this space of observables based on 
that model, compares the number of events seen in that region to the number expected from 
Standard Model processes, and finds an excess corresponding to 3.5 standard deviations. He 
then looks at the rest of the data, and realizes that if he had chosen a slightly different model, 
he could have found an excess corresponding to 6.5 standard deviations. Sticky questions of 
interpretation naturally ensue. Any "successful" dedicated search for new physics is bound to 
end up in a set of highly sculpted cuts, since the original guess is bound to be wrong. 

The development of a systematic framework within which the entirety of the frontier energy 
collider data can be understood could be an important play both in the short term game of 
maximizing the field's chance for discovery before the turn on of the LHC, and in the medium 
term game of maximizing the field's chance for discovery after the turn on of the LHC. 

2 Strategy and Tactics 

The first prong of this systematic framework is an algorithm called Vista. Vista is a
model-independent search for new large cross section physics, designed for understanding the gross
features of the data and the bulks of distributions. Basic physics objects are defined: electrons,
muons, taus, photons, jets, b-jets, and missing energy; all high-pT events are collected; the
contributions from all Standard Model processes are estimated, turning event generators into a
virtual collider; the detector response is simulated; and experimental and theoretical fudge
factors are systematically fit. The data and all Standard Model backgrounds are then partitioned
into exclusive final states. The number of events in each final state is noted and compared to
prediction, and the discrepancy is quantified in units of standard deviations, taking into account
the several hundred different final states considered. This list is ordered according to decreasing
discrepancy, and each day's task is to understand the discrepancies at the top of the list.
Supplementing this single-page vista of the data landscape are ten thousand automatically generated
plots showing a comparison between data and Standard Model prediction in all relevant kinematic
distributions, with differences in shape quantified using the simple Kolmogorov-Smirnov (KS)
statistic, taking into account the several thousand different distributions considered.
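The counting step described above can be illustrated with a minimal sketch. It is not the Vista implementation: the final-state names and event counts are invented, only excesses are considered, systematic uncertainties are ignored, and the correction for the number of final states assumes they are statistically independent.

```python
import math
from statistics import NormalDist


def poisson_tail(observed, expected):
    """P(N >= observed) for N ~ Poisson(expected), with expected > 0."""
    if observed <= 0:
        return 1.0
    # Start at P(N = observed) and sum upward until terms are negligible.
    log_term = -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    term = math.exp(log_term)
    total = 0.0
    k = observed
    while True:
        total += term
        k += 1
        term *= expected / k
        if term <= 1e-17 * total and k > expected:
            break
    return min(total, 1.0)


def to_sigma(p_value):
    """Convert a one-sided p-value into units of standard deviations."""
    p = min(max(p_value, 1e-300), 1.0 - 1e-16)
    return -NormalDist().inv_cdf(p)


def rank_final_states(counts):
    """counts maps a final-state name to (observed, expected).

    Returns rows (name, raw p, trials-corrected p, sigma), ordered by
    decreasing discrepancy.
    """
    n_states = len(counts)
    rows = []
    for name, (observed, expected) in counts.items():
        p_raw = poisson_tail(observed, expected)
        # Probability that at least one of n_states independent final
        # states fluctuates at least this far: 1 - (1 - p)^n_states.
        if p_raw < 1.0:
            p_trials = -math.expm1(n_states * math.log1p(-p_raw))
        else:
            p_trials = 1.0
        rows.append((name, p_raw, p_trials, to_sigma(p_trials)))
    rows.sort(key=lambda row: row[2])
    return rows


# Hypothetical final states and counts, for illustration only.
example = {
    "e + jets": (105, 100.0),
    "mu + jets + missing energy": (52, 50.0),
    "2 photons": (12, 3.0),
}
ranked = rank_final_states(example)
for name, p_raw, p_trials, sigma in ranked:
    print(f"{name:30s} p={p_raw:.2e} corrected={p_trials:.2e} sigma={sigma:+.2f}")
```

The top row of the printed list plays the role of the discrepancy to be understood first. With several hundred final states rather than three, the correction becomes correspondingly more severe.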

The second prong of this framework is an algorithm called Sleuth. Sleuth is a
quasi-model-independent search strategy for new high-pT, small cross section physics. The bulks
of distributions having been understood using Vista, Sleuth focuses on identifying excesses
on the high-pT tails. Sleuth rigorously computes the infamous trials factor, quantifying the
fraction of hypothetical similar experiments in which one would see something as interesting as
what one actually sees in the data. This has been achieved so far at only two of the frontier
energy collider experiments. The first was Sleuth@D0RunI, in which roughly thirty
final states were analyzed in a quasi-model-independent way for new physics in Tevatron Run
I. The more recent and exhaustive H1 General Search represents the first time in the last
thirty years that a frontier energy collider experiment has completely understood its high-pT
data. Sleuth@TevatronRunII will closely resemble the H1 General Search, which significantly
improves and simplifies the region-searching algorithm used in Sleuth@D0RunI.
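The idea of a trials factor computed from hypothetical similar experiments can be sketched as follows. This is a toy illustration under stated assumptions, not the Sleuth region-searching algorithm: a single final state is binned in a summed-pT-like variable, the candidate regions are the tails above each bin edge, the background expectation is taken as exactly known, and all numbers are invented.

```python
import math
import random


def poisson_tail(observed, expected):
    """P(N >= observed) for N ~ Poisson(expected), with expected > 0."""
    if observed <= 0:
        return 1.0
    log_term = -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    term = math.exp(log_term)
    total = 0.0
    k = observed
    while True:
        total += term
        k += 1
        term *= expected / k
        if term <= 1e-17 * total and k > expected:
            break
    return min(total, 1.0)


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def most_interesting_tail(observed_bins, expected_bins):
    """Smallest Poisson p-value over all tails (bins are ordered low to high)."""
    best = 1.0
    observed_tail = 0
    expected_tail = 0.0
    for obs, exp in zip(reversed(observed_bins), reversed(expected_bins)):
        observed_tail += obs
        expected_tail += exp
        best = min(best, poisson_tail(observed_tail, expected_tail))
    return best


def fraction_as_interesting(observed_bins, expected_bins, n_pseudo=2000, seed=1):
    """Fraction of background-only pseudo-experiments whose most interesting
    tail is at least as interesting as the one found in the data."""
    rng = random.Random(seed)
    p_data = most_interesting_tail(observed_bins, expected_bins)
    hits = 0
    for _ in range(n_pseudo):
        pseudo = [sample_poisson(mu, rng) for mu in expected_bins]
        if most_interesting_tail(pseudo, expected_bins) <= p_data:
            hits += 1
    return p_data, hits / n_pseudo


# Hypothetical background expectation and data, for illustration only.
expected_bins = [20.0, 8.0, 3.0, 1.0, 0.3]
observed_bins = [22, 9, 4, 3, 2]
p_data, fraction = fraction_as_interesting(observed_bins, expected_bins)
print(f"most interesting tail: p = {p_data:.4f}")
print(f"fraction of pseudo-experiments at least as interesting: {fraction:.4f}")
```

The second number is the one that matters for interpretation: it already accounts for the freedom to choose the region after seeing the data, which the first number does not.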

The third prong of the framework is an algorithm called Quaero (Latin for "I search for, I
seek"). Quaero is an algorithm designed for rigorous, fast, transparent, robust, and
model-dependent testing of specific hypotheses. The user interface is viewable online. A physicist
should be able to provide the events her hypothesis predicts should be produced in each of the
frontier energy colliders. Quaero is designed to handle the details of testing that hypothesis
against the frontier energy collider data, taking into account expert collaboration-specific
knowledge. Quaero returns a single number quantifying the extent to which the data (dis)favors the
hypothesis relative to the Standard Model, and figures showing how the analysis was performed.
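One way a single number of this kind could be formed is a logarithm of a likelihood ratio. The sketch below assumes binned event counts with Poisson statistics and no systematic uncertainties; the choice of statistic, the function name, and the numbers are assumptions made for illustration and are not taken from the Quaero implementation.

```python
import math


def log10_likelihood_ratio(observed, sm_expected, hypothesis_expected):
    """log10 of L(data | hypothesis) / L(data | Standard Model) for
    independent Poisson bins. Positive values favor the hypothesis,
    negative values favor the Standard Model."""
    total = 0.0
    for n, b, h in zip(observed, sm_expected, hypothesis_expected):
        # log Poisson(n | h) - log Poisson(n | b); the n! terms cancel.
        total += n * math.log(h / b) - (h - b)
    return total / math.log(10.0)


# Hypothetical predictions in three bins, for illustration only.
sm_expected = [10.0, 4.0, 1.0]
hypothesis_expected = [11.0, 6.5, 4.5]

signal_like_data = [12, 7, 5]
background_like_data = [10, 4, 1]

favors = log10_likelihood_ratio(signal_like_data, sm_expected, hypothesis_expected)
disfavors = log10_likelihood_ratio(background_like_data, sm_expected, hypothesis_expected)
print(f"signal-like data:     {favors:+.2f}")
print(f"background-like data: {disfavors:+.2f}")
```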

The knowledge of each experiment's detector response is encapsulated in TurboSim, a fast 
simulation that tunes itself to an experiment's full simulation. Events run through the full 
simulation are used to construct a gigantic lookup table mapping the outgoing legs of Feynman 
diagrams to reconstructed objects in the detector. The price of an additional ~10% systematic
uncertainty buys a speedup of ~10^3 and a decoupling from each experiment's offline framework.
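The lookup-table idea can be sketched in a few lines. This is a toy under stated assumptions, not the TurboSim code: each generated object is described only by a type, a pT, and a pseudorapidity; the table is binned with arbitrary cell sizes; each cell stores the responses seen in hypothetical full-simulation events; and all names and numbers are invented.

```python
import random
from collections import defaultdict


def table_key(kind, pt, eta, pt_step=10.0, eta_step=0.5):
    """Coarse cell of the lookup table for one generated object."""
    return (kind, int(pt // pt_step), int(eta // eta_step))


def build_table(full_sim_pairs):
    """full_sim_pairs holds ((kind, pt, eta), reco) pairs taken from events
    run through the full simulation. reco is (reco_kind, reco_pt), or None
    when the object was not reconstructed."""
    table = defaultdict(list)
    for (kind, pt, eta), reco in full_sim_pairs:
        # Store the response relative to the generated pT.
        response = None if reco is None else (reco[0], reco[1] / pt)
        table[table_key(kind, pt, eta)].append(response)
    return table


def fast_simulate(generated, table, rng):
    """Draw one stored response from the matching cell and apply it."""
    kind, pt, eta = generated
    responses = table.get(table_key(kind, pt, eta))
    if not responses:
        return None  # no full-simulation events populate this cell
    response = rng.choice(responses)
    if response is None:
        return None  # the object is lost in the detector
    reco_kind, scale = response
    return (reco_kind, pt * scale)


# Hypothetical full-simulation pairs, for illustration only.
pairs = [
    (("electron", 42.0, 0.3), ("electron", 40.0)),
    (("electron", 47.0, 0.1), ("photon", 45.0)),
    (("jet", 85.0, -1.2), ("jet", 70.0)),
]
table = build_table(pairs)
rng = random.Random(0)
print(fast_simulate(("jet", 88.0, -1.1), table, rng))
print(fast_simulate(("electron", 44.0, 0.2), table, rng))
```

Once the table is built, simulating an event requires only dictionary lookups, which is the source of the speedup and of the independence from the offline framework. The coarseness of the cells is one place where the additional systematic uncertainty enters.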

The fourth prong of the framework is an algorithm called Bard, a procedure for model 
construction intended to automate model builders. Starting with a particular hint from Vista 
or Sleuth, Bard systematically generates many possible signals, introducing new particles and 
couplings as necessary in order to explain what is seen. Bard uses Quaero to determine model 
parameters and to quantify the goodness of fit. The output of Bard is a reasonably exhaustive 
list of possible new physics interpretations, ordered according to how well each one fits the data. 

The current scorecard of this project is shown in Table 1. 

3 Implications 

These proceedings conclude with a few mildly provocative thoughts. 

If CMS vigorously pursues an approach similar to that described here, while 1000 ATLAS 
physicists divide themselves among 1000 different models, CMS will win. 

An experiment's ability to make its data publicly available is an acid test of that experiment's 
understanding of its data. To the extent that a collaboration really understands its data and 
detector response, making the data available in a generally useful form is trivial. To the extent 
that a collaboration lacks a coherent, self-consistent picture of its data and detector response, 
making the data available for multipurpose use is nearly impossible. 

HIP has overtaken HEP. The first two prongs of our framework (Vista and Sleuth) are
designed to find discrepancies in the data worth pursuing. Having been unsuccessful in finding
such discrepancies in frontier energy collider physics for the past quarter century, we eagerly
anticipate the possibility of seeing such discrepancies at the LHC in a few years' time. Heavy
ion physics is already awash in discrepancies begging to be understood. The construction of a
Quaero-like interface to the RHIC data, and of a Bard-like method for systematically exploring
the space of possible models, represents a novel and potentially fruitful way to proceed.

By algorithmatizing the systematic identification of discrepancies, the testing of different 
hypotheses against the data, and the construction of new models, we are beginning to automate 
the scientific method in the narrow field of frontier energy collider physics. It will be interesting 
to see whether this approach can be successfully generalized to other subfields of science. 

4 Summary 

This discussion rests on the realization that guessing right is impossible — the physics seen 
at the TeV scale is guaranteed to be a surprise. In light of this, a few of us are pursuing 
a systematic analysis of all frontier energy collider data. Vista allows a model-independent 
search for new large cross section physics; Sleuth enables a quasi-model-independent search 
for new high-pT, small cross section physics; Quaero provides a model-dependent automation of
hypothesis testing; and Bard provides a procedure for model construction to aid in interpreting 
the underlying physics. The point of this approach is to systematically maximize the chance for 
discovery, both before and after the turn on of the LHC. 



Table 1: Present project status. The columns below list project steps; rows show experiments. References are 
provided to steps that are completed and published, shown as /. Steps that are technically complete but not yet 
published are shown as ; ongoing work is shown as •; stalled efforts are shown as ♦. Steps with two displayed 
symbols represent efforts in Tevatron Runs I and II. No collaboration commitments are expressed or implied. 